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Abstract 

Approximation of noisy data in the plane by straight lines or elliptic 
or single-branch hyperbolic curve segments arises in pattern recognition, data 
compaction, and other problems. A number of questions concerning the efficient 
search fo-. and approximation of data by such curves are examined. Recursive 
least-squares linear curve-fitting is used, and ellipses and hyperbolas are 
parameterized as quadratic functions in x and y. The error minimized by the 
algorithm is interpreted, cpu times for estimating parameters for fitting 
stra . V lines and quadratic curves are determined and compared, cpu time for 
data search is also determined for the case of straight line fitting. Ouadratic 
curve fating is shown to require about six times as much cpu time as does 
straight line fitting. Curves relating cpu time and fitting error are determined 
for straight line fitting. Lastly, results are derived cn early sequential de- 
termination of whether or not the underlying curve is a straight line. 
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Introduction 


One of the fundamental approaches to pattern recognition in general and 
to recognition of line drawings in particular is a statistical theoretic approach. 
Though other, usually heuristic, approaches are also taken, if one is interested 
in classification with controlled probability of error or with minimum computation 
cost, then a probabilistic approach is necessary. The question of minimizing epu 
time for recognition, is of course, of great importance if one wishes to treat the 
recognition of highly complicated patterns and is a subject of growing interest 
[1-5]. In this paper we focus on a subproblem of the more general problem of 
optimally recognizing complicated line drawings, namely, the problem of recognizing 
underlying straight lines and quadratic arcs in line drawings. Before getting down 
to the problem at hand, we briefly comment on an approach to recognition of com- 
licated line pictures as this will provide motivation for some of the topics we 
examine. However, il; should be emphasized that our interest in approximating data 
by straight lines and quadratic curves is for many other applications as well, 
e.g., data compression in picture boundary representation, contour' line represen- 
tation in maps, etc. The methods we discuss are geared to handling very noisy 
data. . 

We assume, a model for* data generation in which there is an underlying 
picture, in two dimensions, consisting of straight lines and curves and a datum 
is a noisy perturbation of a point on one of the lines. If, e.g., the class of 
pictures under consideration is a class of handprinted capital letters , an 
appropriate perturbation model might be a second order stochastic process with 
the perturbation perpstH’icular to the underlying line and a correlation interval 
roughly the length of the underlying line. We assume the two-dimensional field 
of . the picture is divided into cells, i.e. , quantized in the x and y directions, 
and that the picture gray level is hard quantized. Hence, the data available to 


the computer is a rectangular array of l's and 0's — a 1 if an appreciable por- 
tion of a cell is filled by the picture. A probabilistic description of the data 
' is then given by a distribution for the underlying straight lines and curves and 
a distribution for the noisy perturbations of this picture. It is then in theory 
possible to construct a Bayes. decision rule (minimum probability of misclassifi- 
cation decision rule) for classifying the data as representing a specific member 
of a finite number of families of pictures. This or the maximum a posteriori 
probability decision function v?ill be of interest to us in the problems we treat. 
The maximum a posteriori probability decision function is easier to use and is 
essentially equivalent in practice since radical quantization of some of the 
variables is necessary in practice to sufficiently simplify the problem probabil- 
ity structure in order to use either method. 

As an example of a number of considerations suppose hand-printed capital 
letters are the pattern categories of interest. What are the livelihood functions 
involved in letter recognition? Let be the ith data point and 

w = ^2 * ' ' * > x n »y n ) °^ serve< ^ data vector. There is a prior dis- 

tribution for the occurrence of the 26 capital letters. The maximum a posteriori 
decision rule is "choose the letter for which the joint likelihood of the letter 
and w is at least as large as the joint likelihoods of each of the other letters 
and w". The joint likelihood of H and w might be specified as follows. Let 
P(H) denote the a priori probability of the occurrence of H. We assume the data, 
when H is true, is a noisy perturbation of three straight lines. Since each line 
can be specified by 2 endpoints — 4 parameters — the three lines can be speci- 
fied by a parameter vector a of 12 components. Let p^(a) denote the probability 
density function for the a vector specifying possible underlying M's. Let 
f(w|a,H) be the likelihood of the data given H and a. Then denoting the likeli- 


hood of o and w given H by p(a,w|H), we define the joint maximum likelihood of 
H and w to be 

max P(H)p(a,w|H) 
a 

or 

max P(H)p(a|H)f(w|a,H) . (1) 

a 

If we assume that the noisy perturbations for a line are independent of those for 
the other lines, the perturbatf jn processes are the same for all lines, then (1) 
can be rewritten 

max P(H)p(a|H)f(w|a 1 )f(w|a 2 )f(w|a 3 ) . (2) 

Here*7^o, ,a_ ,o_ denote the parameter vectors for lines 1, 2, and 3 and we assume 
that for each i, the components (x^,y^ ) of w appear in only one of the three 
densities fCw)^), f(w|a 2 ), and f(w|a 3 ), specifically, the density which results 
in (2) being a maximum. If the f(w|cu) are Gaussian, and if, in addition, the 
data points are assumed to be independent perturbations of the underlying lines , 
then f(w|ctj) is simply (2ttc ) ■ J exp[-(l/2o^)q(w,aj )] where n^ is the number of 
data points appearing in f(w|a^.), and q(w,<jj) is the sum of the squares of the 
distances of these data points to the line determined by the parameters cu . Note 
that if p(a|H) were not present, (2) would be maximized by choosing the 's such 
that the lines they specify are the least squares fits to the associated data 
points in w. The presence of p(ojH) modifies the best a somewhat and also plays 
a role in determining the association of the data points in w and the 3 distribu- 
tions f<w|ou), j = 1,2,3. 

The problem with implementing (2) for a? 1 , 26 letters is that the computa- 
tion can become costly. Three important considerations arise when one attempts 
to reduce the amount of computations. First, how many data points should be used? 
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If the data points are assumed to be independent perturbations of underlying 
lines (or curves), the classification error will be a decreasing function of n, 
the number of data points. If the data points are not independent perturbations, 
classification error decreases and then levels off as n increases. Since compu- 
tation cost increases with increasing n, in general it will be desirable to use 
only a subset of the available data. Second, computing (2) for all 26 letters is 
very costly. Third, determining the distribution f(*|ou) with which a data point 
should be associated can be costly. This is related to the so called segmentation 
problem but in some aspects is more complicated if the recognizer processes only 
a subset of the data and hence does not exploit sequential continuity when it 
exists. The complexity of these problems has led researchers to turn to sequential 
recognition. Here it has been realized that by fitting lines and curves sequen- 
tially, it is possible to rule out various letters as being highly unlikely along 
the way, thus avoiding the second high cost referred to. Sequential recognition 
results in other benefits as well. 

We examine the preceding three considerations more carefully within the 
context of sequential recognition. We assume that recognition starts with the 

0,1 quantized data matrix and the goal then is low computational cost recognition. 

What is usually considered as preprocessing, e.g., smoothing and determination 
of data points constituting a curve through use of local connectivity and curva- 
ture, normalization, etc, usually involve processing all the data and are costly 
operations and ones we believe should be included in the total recognition cost 
and treated within the recognition cost minimization framework. The approach 
here, then, is sequential determination of the maximum likelihood function for 
the letter classes and the .data. Only a subset of the available data is processed 

in general. Under the assumption that the f(w|aj) are Gaussian, the choice of 


letter and parameters for which (2) is a ir,?ximum is equivalent to the problem of 
optimally sequentially fitting straight lines and parameterized arcs to subsets 
of the data, (i) Assume the recognition process is at a stage where a straight 
line or parameterized curve is to be fit to data entering a specific local region. 
The appropriate data subset is not yet known and must be found during the fitting 
process. Hence, influenced somewhat by the results of the fit up to the present 
stage, the system searches for data in the specified local region, gets a prelim- 
inary fit of a straight line or other parametrized curve to this data and then 
uses the fitted curve to estimate the location of the next data point to be used. 
Starting from the estimated location of the new data point, the nearest data 
point is found through a search, the parameters for a better fitting curve to the 
augmented data are found, and a now data point then looked for. Hence this 
provides a way of exploiting prior knowledge of pattern strv ;ture for intelligently 
searching for the data subset associated with an underlying parameterized curve. 

If the data perturbations are large , the cpu time consumed in data search when 
fitting straight lines to data can be comparable to the cpu tl^e in computing 
parameter values which result in the least squares straight line fit to the data. 
When the parameterized curve is a quadratic curve , the cost of the data search 
is less than the cost of computing appropriate parameter values. This problem 
of intelligent data search arises in a variety of contexts, e.g., in ballistic 
miss la decoy tracking [6]. (ii) There is a least expected cost order in which 
to search for the underlying parameterized curves. In otherwords, optimal sequen- 
tial parametrized curve fitting can be directed through a least expected cost 
decision tree. See [2D as an example of the type of minimum cost decision tree 
approach one can consider here. The reason for this dependency of recognition 
cpu time on order of estimation of the underlying curves is twofold. First, the 
a priori letter probabilities and hence probabilities of occurrence of various 


combinations of underlying parameterized curves vary. Second, the four major 
operations in curve fitting are: checking to see whether there is an underlying 
curve in the vicinity of a specific region of the space; fitting a straight line 
to data; checking to see which of a line and a quadratic curve best fits the data; 
fitting a higher order parameterized curve to data. These operations require 
different amounts of cpu time with the first requiring the least cpu time and 
the last requiring many times that amount. Hence, e.g., if recognition can bo 
accomplished by showing that the joint likelihood of a letter and the data w is 
unacceptably low for all 25 of the 26 letters, the recognition decision will be 
the remaining letter without actually computing the likelihood of this letter and 
its data. Thus, an optimum curve fitting sequence may avoid fitting costly 
quadratic or higher order curves and rely more for recognition on fitting straight 
lines, where they exist, checking on whether or not curves are present in specific 
regions, and fitting quadratic curves only when absolutely necessary. The optimum 
decision tree depends on the various probabilities involved and on the costs of 
performing various operations, (iii) Finally, how well does one estimate each 
underlying curve? Equivalently, how many data points do we take for estimating 
each underlying curve? Y/e have some choice here since determining the best 
possible fit to the underlying curves is usually not necessary for acceptable 
recognition error rate. On the other hand, it is usually true that the better 
the curve fit up to a stage, say the ith, the less uncertainty and hence less 
costly will be the fit at the following stage; also, the higher will be the prob- 
ability of correct underlying curve determination at the ith stage and hence the 
lower the probability for incorrect fitting and hence costly backtracking at the 
next or other future stages. Consequently, it behooves us to understand the 
relationship between goodness of curve fit and associated computational cost in 
order to determine how good a fit to compute at each stage. This problem of the 


relation among computation and resulting uncertainty at a given stage and the 
required computation at the next stage is important and arises in a variety of 
ways in a pattern recognition problem. See [5] for interesting treatment of the 
tradeoffs in the two stages of letter recognition and then word recognition. 

We Co not study the preceding three points within the context of a complex 
abstract pattern recognition problem sinne the application context has a bearing 
on how the -..hree points are best handled. However, we do provide results on the 
three points which are of use when considering a complete pattern recognition 
problem. Specifically, we discuss efficient algorithms for sequentially fitting 
straight lines and quadratic curves and for searching out the appropriate data. 

For straight lines and quadratic curves we estimate the cpu time for computing 
best parameter values, and for lines we also estimate cpu time for the data search. 
Thus, the relative data search and parameter estimation costs can be compared ar.d 
the relative estimation costs for straight lines and quadratic curves can be 
compared. For straight lines, we arrive at graphs for fit error as a function of 
computation time. Such design graphs can be prepared for quadratic curves as 
well. Finally, we discuss a variety of procedures for early decision on whether 
an under lying straight line or nonlinear function best fits the data. 



- B - 


Models 

We briefly describe the t 1 1 curve fitting models of interest in this paper. 
Both are linear regression models, (i) First, we are interested in the least 
squares fit of a straight line to n data points. The most reasonable error measure 
is undoubtedly the perpendicular distance from a data point to the line. However, 
results are simpler to obtain if we parametrize y as a function of x or x as a 
function of y depending on whether the slope of the fitted line, at that time, 
has magnitude less than or greater than 1. These parameterizations are 
y s c t c,x and x = c + c,y, respectively. Then for the first parameterization, 
the error for a given data point (x^ y^) is the magnitude of the difference in 
y^ and the height of the approximating line at x^; the error measure for the second 
Pw jmeterization is the obvious analog. In practice, one would start off with the 
most reasonable parameterization based on available information, and then retain 
this parameterization or change it depending on the data being processed. Since 
the sum of the squares of the errors in the best fit of this type is related by 
at most a factor of 2 to the corresponding statistic when the perpendicular 
ei^or from data point to line is used, it can be seen that, visually or by any 
reasonable criteria, the resulting line should be perfectly satisfactory. 

(ii) The second model of interest is the fitting of data by a quadratic curve. 

A portion of an ellipse is most useful for our purposes. The parameterization 
x(t) s x + a cos t, y(t) = y + b cos t + c sin t has many desirable features 
and variations have been used, e.g. , [73. This parameterization is probably the 
most convenient to use if successive values of t are chosen first and then appro- 
priate data points are searched for. That is, the approach is most useful if the 
search for an ith data point is conducted by calculating best estimates for x Q , 
y Q , a, b, and c based on the first i-1 data points used and then choosing t^ and 
searching for the data point closest, in the perpendicular direction, to the 


estimated ellipse at t^. However, if one is faced with the problem of fitting 
an ellipse to n specific data points, this parameterization is inconvenient since 
the appropriate value of t to b£n dissociated with each data point is not known and 
the fitting operation then involves some nonlinear programming. An alternative 
procedure is to ask for the coefficients for which the sum of the perpendicular 
distances squared from n data point to the curve 

CjX 2 + c 2 xy + c 3 y 2 + c 4 x + c 5 y - 1 = 0 (3) 

is a minimum. Again, the solution can be found but the required computation is 

considerably greater than if an alternative error measure, which we now describe, 

is used. Let c denote the vector (c^, c 2 , ..., Cg) and z denote the vector 
2 2. T T 

(x , xy, y , x, y) . Let z denote the z vector for the point (x^, y^). Then we 
seek the c vector for which 

l <cV - l) 2 (4) 

i=l 

is a minimum. This formulation can of course be used for a straight line, for 
curves of higher order or for other parameterizations as well. What is the curve 
resulting from miminization of (4)? The curve can be any one of an ellipse, a 
hyperbola, or a parabola. The chance of the last of these occurring is essen- 
tially zero. We would like to obtain an ellipse and in the section on experi- 
mental results we discuss ways of coaxing the algorithm to generate an ellipse or 

at least a useful hyperbola. 

T t 

For c fixed, c z is a function of (x,y), and, hence, w-c z - 0 is a sur- 
face in Euclidean w,x,y-space. The error measure (4) ic the sum of the squares 

T i T 

of the distances of the points h. H c z , on the surface w-c z - 0, to the plane 
parallel to and one unit above the x,y plane through the origin. See Fig. 1. 


The intersection of the surface w-c z - 0 with a piano parallel to the x,y piano 

through the origin is a quadratic curve c z + d = 0 or is empty — depending on d. 

Suppose t for example, that (3) is an ellipse. Then as d varies, the curves 

c z + d = 0 consitute a family of ellipses with common center, same major axes 

and minor axes, and such that the ratio of the distances of the curve from its 

center along the major and minor axes is the same for all d. An alternative and 

more useful interpretation of (4) is the following. Consider the ellipse and 

data v d g ( x d»y<i* T in Fie * 2 * Let V o S (x o ,y o^ T denote thc clli P se center and 

v = (x ,v ) T denote the intersection of the ellipse with the straight line from 
e e ,J e 

T 

v to v.. c z-1 can be rewritten as 
o a 

(v-v ) T C(v-v ) - vjev - 1 (5) 

o o o o 


where 


c = 

r c x -c i 
C 1 2 2 

1 

and v = -(1/2 )C ^ 
o 

w 

C c 


2 °2 C 3 

m m 


L 5 J 


Upon defining u to be v -v and 6 to be a scalar such that v.-v = u + Au , 
r ° e eo d o e e 7 

(5) becomes (v -v ) T C(v -v ) - v^Cv -1 + 6u T C6u + 26u T Cu , or 6(6+2)u Cu . 

go e o oo e e ee ee 

T T 

Finally, since u Cu -v Cv -1 = 0, (5) can be written as 
J ’ e e o o ■ 

«(«+2)p (6) 

where p is the constant 1+v^Cv q . Note that 1 6 j is the ratio 1 1 v^-v^ | | / 1 | u g 1 1 
and, hence, is the ratio of the distance of the data point to the ellipse, along 
the line from the data point to the ellipse center, divided by the distance from 
the ellipse to the ellipse center along this line. See Fig. 2. Since | |u | J is 
a maximum in the direction of the ellipse major axis and a minimum in the direction 
of the ellipse minor axis, it can be seen that if two data points are equidistant 



1 


from the ellipse but one lies along the ellipse major axis and the other along 
the ellipse minor axis, then tho contribution to (4) of the data point lying along 
the ellipse minor axis will be greater, assuming [ <5 1 « 1. Also, for |6| fixed, 
(6) has larger magnitude for 6 > 0 than for $ < 0. The difference in magnitudes 
is negligible for [ 6 | « 1, 


Recursive Estimation 

Estimation of the parameters of the models y = c Q +c^x and 
2 2 

CjX + c 2 xy + c 3 y + c^x + c g y - 1 = 0 is efficiently realized through use of 
recursive least squares estimation. Specifically, letting 


A r = [z 1 ! z 2 ! ... | z 11 ] 7 , and b = (y^y 2 ,. . . ,y n ) 7 

or b s (1,1,..., 1) , n l's, depending on whether we are dealing with tho first or 
the second model, respectively, the problem of determining the optimum c is that of 
determining the c which minimizes llA^c - b|| 2 . The solution from [8] or C9] is 

c = A*b (7) 

+ T T 

where A denotes the pseudo inverse (A A„) A . However, from [10] we see 
n n n n 

T -1 - T 

that (A A ) can be computed recursively. Let B denote A A . Then 
n n n n n 


, n+1 . .T -1 . B~ 1 z n+1 z n+1 B“ X 

B''\ = ( l zV ) s B" 1 - Ji = 2— 

i=l , . n+1 „■! n+1 


( 8 ) 


IB. tl I 4. *, A t 

1+z B z 
n 


n 


Since Z n = Ad 0 = £ z^, , we see that the optimum c 11 is given by 

n « « i 


c ntl = B'i, Z n+1 


B -1 n+1 n+1 D -1 
, B z z B 
-1 n 


- 1*1 

i 

n r 

— Un, 


n+1 n+ll 


f \ 


Honco, B^ and 7r are efficiently computed recursively, and c n+1 is computed in 
terms of these functions and the latest data, i.c., b n+1 and z n+1 , 

The error £ n = 1 1 A n c n -b n 1 1 2 can also be computed efficiently. Since 


n+1 


n+1 


n+1 


- V h 2 9 / V v M n+1 . n+1 

•n+1 “ J, b k " 2 V ) c + .1 > 


k=l 


) 


k=l 


k*i 


it is trivial to show that 


f n+1 = Hi> n+1 H 2 - = ||b n+:L || 2 - Z n+lT c n+1 (10) 


where !ib n+1 ii 2 , z n+1 , are all trivially recursively updatable so tha* 


£* n+1 is com P uted in tern,s of 1 1 b n 1 1 2 , Z 11 , B" 1 , b n+1 , and z n+1 . Note that almost 
all of the computational effort for c n+1 and ^ is accounted for in the compute 


tion of B. and then Br*,Z n+1 . Since this computation is needed for the deter- 


t.+l “““ n+1 

mination of both c n+ ^ and . , it follows that once one computes one of these 


estimates, the other is computed cheaply. 


Computation Costs 

Since recursive estimation schemes are under consideration, we estimate 
the CPU time for one such recursion. We deal with four costs, these being: 

(1) cost of computing c 11 ^ 

(ii) additional cost of computing 

(iii) cost of predicting best initial search point for new data point search 

(iv) new data point search cost given first search point. 

All costs are estimated by determining the numbers of various types and associated 
CPU realization times for arithmetic. Boolean and memory management operations. 

CPU realization times are for assembly language programs and are taken from Til]. 
The instruction list and associated realization times are given in Table 1. 



Table 2 contains a list of the numbers of the various instructions determining the 
four aforementioned costs and the resulting costs. A short explanation of the 
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details of these calculations is the following. The inversion of is carried 


out by first computing B~*z n+ \ This vector is then used in the computation of 


(8). Since z nTA = ^» x n +i^ v; ^ en a straight line, only two multiplications 

are required in computing B~^z T1+ ’ L in this case. Since B n ^z n+ ^ z n+ ^ B^* is 


symmetric, only the diagonal and half the off-diagonal entries must be calculated. 


Finally, dt is not necessary to divide each component of B^z I1+1 z n+ ^ by 


T T 

l+z r * +1 B~^z n+ ^. Dividing the vector B n ^z n+ ^ by l+z n+ ^ B n ^z n+ ^ and then computing 


[(l+z tl+lT B“ 1 z I1+1 )“ 1 B" 1 z n+:L ]z n+1 B -1 is more efficient, 
n n n 


Now consider costs (iii) and (iv). For model 1, a choice is made for x n+ ^ 
and then a search made for a data point having this x n+1< If the data is complex 
consisting of more than one noisy line and arc, the data point of interest is 
that closest to the line determined by c n . If there is only one line and no arcs 
presented, the only consideration is a least cost search for the data point and 
such a search, for data generated by Gaussian perturbations of a straight line, 

.A 

involves a good initial estimate y n+ ^ followed by successive examination of cells 
above and below those last explored. The necessary estimate in the first case, 
which is also the minimum mean squared error estimate to be us&d in the second 
case, is 

_ n+l T n / 1 \ 

y ,, = z c . Ill) 


n+1 


In practice, x n+ ^ would probably be chosen such that the differences x n+l ” x n 
are equal and this constant chosen at the start of the search. For model 2, the 
preceding considerations also apply. The initial choice we consider in the 
search for a z is a point z on the curve 


T 

*n+l n 
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Such a point might be found by choosing an x n+ ^ or a y n+ ^ and then solving (12) to 
.determine the other and, hence, z n+1 . The search for z n+1 is then probably most 
cheaply carried out by determining and then searching in the direction perpendicular 
to the curve at z n . Alternative less accurate choices for y and z might 
be considered, although the saving in costs for computing these estimates is small 
compared with the total computational cost of one recursion, and, hence, not a 
significant consideration. For example, the cost of calculating y n+1 = z n+1 c 11 is 
essentially the cost of one multiplication. 

For the 1st model, the search for ( x n+1 >y n+ i) is the sequence ( x n+ i»y n+ i +A ^» 
(x »y .-A), (x ,y .+2A), etc., where A is the quantization interval. Under 

HtX TlTX *1 i I1TX 

, - T *2* 

the assumption that y n+1 -z n+X e is a Gaussian r.v, , with c = ^ c Q » c i^ tlie true P ara- 

2 

meter vector, the specified search is of lowest cost.* Assuming the variance a of 

2 

v „“(c +c, x ,) given x , is much created than A , ‘an estimate of the expectation 
J n+1 o 1 ntl n+1 

of the number of iterations to finding y n+ ^ is obtained as follows. y n ^ and 
T 

y ^ = z n+1 c n are independent Gaussian random variables with common expectation 

n+ -|T o * 

z c. The variance of y n+1 is of course, c , and that of y n+1 is 

var c n t (x , ) 2 var c" [12]. Note that by c n and c” we mean the first and second 
o n+l l ox 

components of c n . Hence, 

var Vl * + (x n.l - 31,3 °ntl S V ” ,< WW 


■ var + ™ y„+i ■ + (x n+1 


- 9 r - 2 — 

x ) / / (x.-x ) ]. (x^ denotes 
n • * i n n 

i=l 


i 

n~ A £ x..) The expected number of iterations is 

i=i 1 

p{y n+r y n+i = 1} + 2,p{y r,4.i-y« +1 s “ 1} + 3 * p <y„ + i- y r, + i s 2 } + ... 


r ntl J n+1 


which is roughly 


2 l 2iA(2ma 2 ) exp[-(l/2c 2 )i 2 A 2 ] “ 


4(2ir)~ 1/2 c n+1 /A 


for o/A » 1, Mote that a/A » 1 => ° n +i' A >:> 1 wh* 0 * 1 then implies that the 

a * 

probability of the events y n+ ^ = y n+ i + ^ anc * ^n+1 = ^ntl ~ aro oac ^ 
approximately A(2Tic n+1 ) exp[-(l/2c n+1 )i A ] and the sum in (13) can be 
approximated by an integration. When is a constant for all i $ n, 

°n+l s ° 2<n3 + 12n2 + 20n + 4 >/< n3 + 8n 2 ) . (14) 

Then, the expected number of iterations is roughly 

4(2ir)" 1/2 (o/A)[(n 3 + 12n 2 + 20n + 4)/(n 3 + 8n 2 )] 1/2 . (IS) 

- 1/2 

As n ( 15 ) converges to 4 ( 2 ir ) (o/A) and is indeed close to this value 

for n 5 5. 

From Table 2 note that the sum of costs (i), (ii), and (iii) is roughly 
six times as great for a quadratic arc as for a straight line. 

Variability In Straight Line Approximation 

T n 

The variability in the fitted line y = z c depends on the number and 
specific choice of the x. 's. In order t< make an intelligent choice for the x^ 
a measure of fitted line variability is needed. Under the assumption that the 
y^ are perturbations of some line c q + c^x. , a useful measure of fitted line 


variability is 


,T , T n T T _ . 

tT>rt In 1 *2 . n n n *2*,, 1/2 

{E[(z c - z c) + (z c - z c) J) 


which is the square root of the sum of the variances of the y components of the 
fitted line endpoints. (The assumption here is that (c^J S 1; otherwise, the 
variances in the x components of the fitted line endpoints are of interest.) 

The evaluation of (16) is like that of a ^ and is easily shown to be 


r. \2.1/2 


o{2n”' L + [(x-x; + (x-x ) j/ l (x.-x r} 

I n n n . i n 


which reduces to 


and is 


o[2n“ 1 + 2<x-x ) 2 / l <x.-x ) 2 ] 1/2 , 
n n x n 


c[2(4n 2 + 2 On + 4)/(n 3 + 8n 2 )] 1/2 


when x.-x. , is constant for all i £ n. A more useful measure of variability is 
x x-i 

(16)/L or (14)/(x ~x, ) where L is the length of the approximating line. Since, 
n x 

for the parameterization under consideration, the slope of the approximating line 
has magnitude less than 1, the two measures are essentially equal. Hence, for 
convenience, we shall use 

Co/(x .-x 1 )][2n“ 1 + 2(x -x ) 2 / | (x.-x,) 2 :) 372 (19) 

n l n n x n 

or 

Co/(x -x, )][2(4n 2 + 20n + 4)/(n 3 + 8n 2 )] 1/2 , (20) 

n l 

when the x^-x^ ^ are equal for all i, as our measure of straight line fitting 
variability rr "error". The advantage of (19) over (17) is that division of (17) 
by x^-x^ appropriately normalizes the measure of endpoint variability. That is, 
as long as the standard deviation of the endpoint of the approximating straight 
line iu much smaller than is the approximating line length, we have a useful 
approximation and are justified in characterizing the error or the fitting 
variability as being small. 


Variability In Straight Line Approximation As A Function of Computation Time 


Since most of the computation time in identifying and approxxmatxng, by a 
parameterized line, a straight line or an arc is spent in the approximating process 
we examine the relationship between curve fitting error and computation time. We d 
this, however, for straight lines only, and furthermore, for x^ is such that x.-x^_ 
is a constant for all i. Consequently, for the error measure we use (20). Since 
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lino fitting is realized recursively with one recursion for each data point used* 

the total computation time is the sum of the computation times for each recursion. 

There are three co; __ utation costs which are the same for each recursion , and a 

fourth which is a function of the number of recursions previously completed but 

which becomes essentially constant after four or five recursions. We denote the 

first 3 costs per recursion as t , t , and t with t the cost of computing the 

c e p c 

estimate of the parameter vector c, t g the cost of computing the sum of the squares 

of the distances from data points used to the latest best line fit , and t the 

cost of computing the best first search point for the new data-point search. The 

fourth cost, t (i), is the cost of searching for the i~th data point given the 
s 

best starting point. As mentioned, t g (i) is essentially constant for i 5 5. 

If n data points are used in the fitting process, the fitting cost is roughly 

n(t c + t + t ) + l t (i) ’ (21) 

c e p i~l 


which is approximately 


n(t t t + t + t ) . 
ceps 


(22) 


Before evaluating (21) and (22), we make a few observations on the relationship 
of error to cost. Among the situations which can be encountered are two extremes: 
(i) t c + t e + t » t s , and (ii) t ft + t fi + t p « t g . Suppose (i) is true. Then 
n is directly proportional to fitting cost. Since error varies roughly as a 
constant multiplied by n** 1 ^ 2 , we see that error varies as (cost) multiplied 
by a constant and this constant is uniquely determined by the parameter o/Cx^-x^). 
Suppose (ii) is true. Then for n » 5, a reasonable approximation to fitting cost 

is nt g . However, since t is directly proportional to c, then again using the 

“ 1/2 

fact that error is roughly constant *n , we conclude that error is roughly 

constant (a 3/,2 (x -x, ) _1 ) (cost) -1/ ^ 2 . Hence, error then is a function of the 
n x 

parameter o ' /(x^-x^). A typical curve of error versus cost is that of Fig. 3. 


10 


Stopping Procedures 

The process of fitting a line or a quadratic curve to data was treated 
in the preceding sections. Two questions remain. First, how should the computer 
make a decision as to which of a line or quadratic curve best fits a subset of 
the data, and how useful is this data subset to the recognition of the entire 
pattern? Second, how does the computer decide, during the sequential fitting 
process, when to terminate the fitting of a line or quadratic curve because the 
latest data being precessed is that for a new line or curve? Under the assumption 
that good line and quadratic curve fits are possible, i.e., parameter estimation 
error is small for estimates based on a reasonable number of data points, stopping 
decisions can be made at costs smaller than the curve fitting costs. In this case, 
attention need be given only to the first question. If a curve fit is carried out 
with large estimated variance because an insufficient .number of data points is 
used or is available, then terminating the fitting of a curve could be rather costly 
and could require processing much of the data used in fitting the intersecting 
curve which comes next. We assume that the former case pertains. Thus, the 
problem to be considered is to decide as early as possible whether the data used 
in a curve fitting is that for a line or for a quadratic curve. How does previ- 
ously collected data affect the decision? Roughly for cases (i) and (ii) treated 
in the following paragraphs, based on the previously collected data or on some 
function of it, we have a conditional joint distribution for whether the under- 
lying curve is a line or a quadratic and for the associated unknown parameters . 
Denote this density p(H,c[w) where H denotes the hypothesis and takes values H q 
and for line and quadratic, respectively, and c is the parameter vector for the 

curve type for the associated hypothesis, w denotes previously accumulated or 
reduced data. Given w, the joint likelihood of H,c and the most recently acquired 


data y. , y # . , . . . , y to which we are interested in fitting a curve is 
x n 

p(H,c|w)f(x 1 , . x n |H,c) (23) 

where we are assuming the data as perturbations of the underlying curve is inde- 
pendent of w given H,c. 

For reasons discussed in the introduction, the stopping rules we briefly 
examined are those involving maximum likelihood ratios. Also, in general we are 
apt to be interested in a multihypothesis problem involving a decision as to which 
of a number of straight lines or arcs best fits a data subset entering a local 
region. Such a decision will be made by comparing each of one or more discriminant 
functions with a threshold where a threshold will depend on the discriminant func- 
tion being tested and may depend on the number of data points and the extent of 
the region over which they are taken. (See fl3] for some comments on abstract 
multihypothesis sequential decision theory.) In practice, the thresholds would 
probably be determined partially through experimentation. Model (3) is most easily 
studied and an understanding of its stopping behavior should provide a rough under- 
standing of the stopping behavior of (4). We consider three versions of the simple 
problem of deciding whether the data constitutes a noisy perturbation of a linear 
function of x or of a quadratic or perhaps less restricted function of x. The 
three versions represent three different assumptions of prior structural knowledge 
of the underlying curves in x. Problem (i). We assume the underlying curve is 

y = c + c,x under H and is y = c + c,x + c„x with c 0 known under H. . 

-'ol o -'ox 2 2 x 

c and c, are different under H and , are unknown and may enjoy large ranges 
of possible values. The assumption of c 2 known is made in order that we be able 
to devise a test to distinguish between linear functions of x and functions of 
x which have at least some minimum quadratic content of interest. If the true c ^ 
is in fact larger in magnitude than the c for which the decision function is 
designed, then the decisions made will have lower error probability than that 
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designed for. Problem (ii). *IV/o hypotheses as in (i), but c 2 is not considered 
known. Problem (iii). A linear function is assumed under H q , and no assumption 
is made concerning the form of the underlying curve under Hence, the assump- 

, tions range from somewhat restrictive in (i) to very unrestrictive in (iii). 

The decision pi cedure then for (i) or (ii) is 


max p(H 1 ,8|w)f(y 1 ,...,y n |H 1 ,B) 

H if J > 1 

max p(H ,‘y|w)f(y 1 ,... I y n |H o ,y) 

y ' 

H otherwise . 
o 


(24) 


Equivalently, the log of the left side of (24) can be compared with 0. In 
practice, p(H,c|w) will be of use only if it is of very simple form. This most 
likely restricts it to being Gaussian in its dependence on c or a function which 
takes constant values over a few simple regions in H,c space. For example, a 
particularly simple function. Fig. 4, might treat all lines having leftmost ends 
close to vertical A and right most ends in the shaded region as equally likely . 
The function might have one or more such regions of fairly high probability and 
the remainder of the space have likelihood 0. 

A convenient form for viewing (24) is the following. Under our white 
Gaussian noise perturbation assumption, p(H Q ,Y|w)f(yj L> . . . ,y n |H <j ,Y) is 

p(H ,Y|w)(2iro 2 ) _n/2 expC-(l/2o 2 ) £ (y £ - Y Q - Y^) 2 ^ < 25 > 

° i=l 

which can be reformed as 

p(H o ,Y|w)exp{-(l/2cr 2 ) l [c q - y q + - Y^^l 2 ) * 

(2ma 2 )" n/2 exp[-(l/2o 2 ) £ (y. - <= - c x ) 2 ] 

i«l 1 


( 26 ) 


where c Q and arc the parameters for a least squares lino fit to the data 

y.,...,y . Hence, the influence of w on the maximization of (26) with respect to 
j* n 

y appears in the first line of (26). The second line of (26) involves only the 
latest data subset and the least squares line fit to that data. If the value of 
y at which (26) is a maximum is determined largely by the exponential in tho first 
line of (26), then p(H q ,y|w) in the denominator of (24) can be placed before the 
maximization. If p(H^,&|w) can similarly be removed from the maximization in the 
numerator of (24), the effect of w is simply to multiply 

max f(y 1 »...,y n |H 1 ,P) 

(27) 

max f(y x ,... ,y n |H o ,Y) 

by a constant. This situation is likely to occur when n is large. (This does 
not mean that w‘has not played an important role. It plays an important role in 
guiding the search for and then in weighting (27).) Consequently, we 

examine some properties of decision making based on (27). 

The x values x^, x 2 ,..., at which data points are sought are chosen first, 
based on prior knowledge of the lengths of lines or curves likely to be encount- 
ered. In practice, equal increments x.-x. . would probably be used and one can 
determine a best (in some sense) increment to use. We do not do that here. Ihe 
object of this section is to provide some insight into the structure of good 
decision rules for deciding with a minimum number of data point s whether the 
latest data subset is best fit by a line or a nonlinear arc. 

Case (i): For this case we employ a Wald sequential probability ratio 

test (SPRT) [14^ having the property that the decision for accepting H q or is 
made with predetermined probabilities of incorrect decision under H q and set 
by the designer, and this performance is achieved by processing the minimum 
expected number of data points. The parameter vector c n giving the best linear 
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fit to the data has previously been found to be (9), i.e., 


n 




If the quadratic approximation is used with known coefficient c 2 for the second- 
power term* the least squares estimate for (c q ,c^) is given by 


n 


C,Ii * B " ill Zl(yi " °2 X i ) ’ ° r 


n 


*» n s c n - B”' 7 '' r ~ i 


n 


X V i 2 
l Z c„x_. 


i=l 


Ti 


(28) 


Then a sequential maximum likelihood ratio test consists of comparison of the log 
of the maximum likelihood ratio, (27), with a pair of stopping boundaries. The 
log of (27) is (29) multiplied by a constant. 


(1/2) [- l (y. - zTcV + l (y. - zTc ,n .- c^x?) 2 ] (29) 

i=l 1 1 i=l 1 1 1 x 

(z.c and z.c ,n + c x7 are merely the approximations to y. provided by the linear 
and quadratic functions which are least squares fits to the n data points.) 
Equations (29) can be rewritten as 

- <4 c ’ n + C 2 x i^ 

-(1/2) l C(z7c n ) 2 - (zjc' n t C.x 2 ) 2 ] . (30) 

• ■ X X Z X 

1=1 

This is the usual operation of cross-correlation of the data sequence with a 
reference function and then the addition of a number not depending explicitly on 
the data. In a classical simple two hypothesis testing problem, c 11 and c ,n are 
not functions of the data. The reference function is then a deterministic function 
which is the difference of the mean value functions under the linear and quadratic 
hypotheses. Also, when dealing with simple hypotheses, the second summation is 
a constant, i.e., independent of the data. Since we are dealing with composite 
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hypotheses, the reference function is a function of the data, and the. f "ond 
► 

summation in (30) is also a function of the data* Thus, the decision function in 
(30) is an estimator-correlator. An interesting simplification of (30) can be 
made. Through substitution in (30) of (9) and (3), (30) reduces to 



(31) 


-1 r 2 

The interpretation of (31) is as follows: £ z i c 2 x i is the coefficient vector 

2 2 2 

for a least squares linear fit to the sequence CgX^ , c^Xg , ...» c^x^ . Thus , the 

T “1 r 2 

line having ith x,y values x^ and l z i c 2 x i * s w hich differs from 

the quadratic function c^x , at the points x^,...,x n , least in a sum of error 
squares sense. Consequently, the decision function reduces to that which would 
be used for testing the hypothesis that the data is a perturbation of c^x against 
the alternative that the data is a perturbation of the line which is most like 
c 2 x at the points , x 2 » . . . , x n . The linear hypothesis used here is the most 
conservative possible ! The test is intuitively reasonable since on the one hand, 
the user is only certain of the c^x portion of the quadratic hypothesis, and on 
the other hand, the user knows only that the alternative hypothesis is linear and 
a worst case hypothesis is therefore a safe one to use. The resulting error rate 
will of course be higher than if the parameters in the two hypotheses were known. 

The question that than remains is whether the data is such that the test performs 
as would a test for hypotheses which were truly simple. The answer is yes, and 
this is justified in the Appendix. 

In summary, then, the stopping test reduces to the test of a simple hypothe- 
sis against a simple alternative and the stopping thresholds are set in accordance 
with the classical theory [14'j for such hypotheses. 


1 


”/ ' r 'I'] 









t 

1 

! 


1 

j 

1 

* 

k 

\ * ' ‘ 


, - 24 - 

Case (iii): Case (i) was a stopping rule baaed on considerable a priori 

information concerning the problem probability structure, In Caso (iii) we 
function with minimal prior information. The mean valuo function is assumed to 
. be linear under hypothesis H and nonlinear under hypothesis H. . tto additional 
information is assumed about the mean value function under the alternative hypoth- 
esis. A standard approach then would use a chi-square test on the sum of the 

f 

squares of the errors in the best linear fit to the data. This would involve 
fixing n and a > 0 and deciding H q or depending on whether the sum of the 
squared errors in the best linear fit to n data points lay within or outside a 
1 - a confidence interval. The drawback of such a tost is that if the confidence 
interval is large and if is true, the probability of the decision being H q may 
be large, i.e., the power of the test may be small. If we wish to design the test 
to operate with some specified minimum power, then wo .are back to Case (i). 

A compromise is to choose n such that the length of the 1 - a confidence interval 
can be fixed at a value we consider to be reasonable for H q . Then if the confi- 
dence interval is not violated, we can be quite certain that H o is true. A reason- 
able choice for the length d of the confidence interval might be the following. 


Let e(n) denote (£ /n) 1//2 

n 

(sum of squared errors in linear fit to n data points/n) 


1/2 


That is, e(n) is the average error in the linear fit to a data point. A smooth 

curve of length i might be considered to be a line if it lay within a rectangle 

of length Z and width d with d/Z « 1. Since the data for our curves may exhibit 

considerable fluctuation, we shall consider n data points to have teen generated 

under H if 
o 


e(n) < d , 


( 32 ) 


Hence, we choose d such that any fairly smooth curve of length roughly i which 
lieB within a rectangle of size d by i is in appearance what we are preparod to 
call a lino. Since the variation in lengths of curves of interest may bo consid- 
erable, we choos« an appropriate number p and then determine d from 

dp = % . (33) 

Of course, a meaningful d must be greater than the noise standard deviation a. 

In summary, then, a donfidence interval length d is chosen in terms of specified 
p and l by (33) and n is then determined in order that a 1- a confidence interval 
under H q have this length. This provides us with a mechanism for being reasonably 

sure that if our decision is H . the associated mean value function is indeed 

o 

linear, and, furthermore , the probability a of deciding when H q is true, is 
small. We might choose l to be somewhat smaller than the a priori expected lengths 
of the lines which could be present or simply choose l to be of minimal size yet 
sufficiently large that n is not excessive. Finally, we mention that these results 
can be used as a guide for devising a sequential test in an obvious way if the 
a priori uncertainty in the possible value for t is great. 

Case (ii): We point out that prior knowledge of the nature of the mean 

value function under which is intermediate between that used in Cases (i) and 

(iii) can be assumed. Namely, we can assume that the mean value function under 

2 • 

H, is c + c,x + c_x with c„ unknown and arbitrary. Then the log maximum likeli- 
1 o 1 Z t 

hood ratio test becomes the comparison of 

0 " 2 tc nT A T b n - c ,n A’V] (34) 

n n 

with 0 where c 11 and A apply to the quadratic model z = (l,x,x ) , and c' and 

T 

A* apply to the line model z' = (l,x) . Since the test statistic is chi-square 
n 

distributed with 1 degree of freedom, the test can be designed for a specified 
probability of error when H q is true. However, if one is willing to assume a 
quadratic mean value function under H^, then one may as well treat the problem 
as Case (i) and we therefore do not investigate Case (ii) further. 
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Further Comment on Dependence of G oodness of Fit on Humber of Data Points 

•MMMMMMaMni I I » »>■»«*— mmm — — n '»«■' ' » ^ ^ 1 1 

For those applications where the joint likelihood of curve parameters 
and data is used, it is generally desirable that the functional dependence of 
this likelihood on the parameter vector be highly peaked in the vicinity of the 
true underlying parameter vector. Some insight into this dependence on number 
of data points is provided in this section. The joint likelihood function may 
be carried along as a performance functional in a sequential pattern recogni- 
tion problem and it also appears in the simpler problem of recognizing which 
of two underlying curves of the same degree (or families of such curves) is 
associated with the data. 

We discuss the case of an underlying straight line (the same discussion 
holds for a higher order curve). Assume the line is parameterized with y as 
a function of x. The joint density of line parameters and data is that of 

(25) and (26), The second exponential in (26) is not a function of c. 

Taking the natural logarithm of (26), we see that 

(-n/2)ln (2iro 2 ) - (l/2a 2 ) E (y. - c - c x. ) 2 

i=l 1 ° 1 

is not a function of y» and has well known behavior since 

« H AAA 

o" E (y. - c - c, x.) is chi sauare distributed with n-2 degrees of 
i=i 1 ° 1 1 

freedom. The remainder, 

n * 

In Cp(H o , y|w)D - (l/2o 2 ) E Cc Q - Y q + (c 2 - Y-l) x .)] 2 (35) 

* i=l 

is of course a function of y» and would be the only part of the logarithm of 

(26) entering into a test as to which of two underlying lines families of 


v i ’7 Tff 1 i'»i 
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lines best represented the data. Hence, such ' test might be thought of as a 
comparison of how well the maximum likelihood line 'fits the lines or families 
of lines associated with the two hypotheses. 

Denote by the true underlying parameter values. Thin 

- (l/2o 2 ) E | E £ c 0 ~ + <c x - Y x ) x^ 2 y t | 

'■i=l * 

.-(l/2 0 2 ) E e|ho - Y t + (c - Y lt > |v t } -(1/2C 2 ) E [Y 0 - Y ot + <Y X - Yj 

1=1 ' 1 ’ 1=1 


The nature of the dependence of (36), the mean of (35), on n shows up on the 
right of the equality. As a portion of a performance function, (35) becomes 
useful for those n for which it is small. I* can be seen that n should be 
at least as large as is necessary for the first summation on the right of (36) 
to be small in magnitude compared with the magnitude of the sum of 
ln[p(H o ,y|w] and the second summation on the right of (36). Mote that the first 
summation on the right of (36) is related to the measure of line fit used in 
the previous section and graphed in Fig. 3. The error measure used there is 
the first and last terms in the first summation on the right of (36). 


Experimental Results on Fitting Ellipses to Moisy Data 


Experiments were run using the proposed method for fitting quadratic arcs 
to noisy data which appeared to be appropriate for approximation by an ellipse. 

Two phenomena of interest occurred. First, the optimum quadratic curve 
was on occasion a hyperbola with each branch fitting a portion of the data. 

Since no constraint is imposed to restrict the parameters to those for an ellipse 
the fitted curve can be an ellipse. 
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a parabola, or a hyperbola. The parabola essentially never occurs, being a 
degenerate ellipse or hyperbola. A hyperbola will sometime result in smaller 
fitting error than an ellipse . . . especially if the sampling interval is not 
uniform along the curve and the data to be fit is somewhat clustered as in 
Figure 5. Since a single branch of a hyperbola is a satisfactory representa- 
tion for our purposes , but an approximation involving two branches is not , 
the problem is to force the best fitting quadratic to be an ellipse or single 
branch of a hyperbola without giving up the computational advantages of the 
linear regression methodology. Two approaches to this problem have been 
tried. The first is to take data points at reasonably equally spaced intervals 
along the curves being fitted, and then, if necessary, to introduce one or a 
few artificial data points at appropriate locations to force the desired type 
of curve without significantly influencing the parameters for the curve of 
desired type. The approach works in some instances. 

In the section on Models, we interpreted the data point error measure 
when the approximating arc is a portion of an ellipse, A similar interpreta- 
tion is possible when the approximating curve is a hyperbola. Note that the 
determinant of the C matrix in that section determines whether the curve 
(5) = 0 is that for a hyperbola or an ellipse. The curve is an ellipse when 
the determinant of C is positive and is a hyperbola when the determinant 
is negative. Suppose C defines a hyperbola and a data point of interest is 
v^ = (x^, y_, ) " shown in region I of Figure 6. Then, upon letting v^ denote 
the point of intersection of the line from v q to v^ with the hyperbola, we 
define 5 by 



S(v h 



(37) 
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For two data points each a small pe;. lendicular distance d from the hyperbola, 
the one furthest from v Q contribute, more heavily to the error measure, (4) 
or the square of (5). If the data point falls in region II In Figure 6, the 
contribution to (4) is interpreted as follows. If, in the equation defining the 
hyperbola of Figure 6, the coefficients of x and y are interchanged, the 
resulting hyperbola, shovm in dashed lines in the figure, will be referred to 
as the complementary hyperbola. Let be the data point shewn in region II, 
and denote by v the intersection with the complementary hyperbola of the 
line from v q to v^. Then upon defining 6 to be the complex scalar satisfying 

v d - i v c = « i v c , (38) 

(5) again can be reduced to (6). Note that for (37) and (38), (6) can be 
rewritten as 

(x d /Xh) 2 - 1 , equivalently, (yd^h^ " 1 O 9 ) 

and 

-(x d /x c ) 2 - 1 , equivalently, -(yd/y c ) 2 - 1 , (40) 

respectively. If, for example, the desired fit to data lying abc - . ' the x axis 
is an ellipse with major and minor axes more or less parallel to the x and 
y axes, but the algorithm has fit a hyperbola as in Fig 5 instead, the introduc- 
tion of a few appropriately placed artificial data points along the line of 
symmetry lying outside the two branches of the hyperbola (y axis in Fig. 6) 
will usually then force the best fitting curve to be an ellipse. Good locations 
for the artificial data points are easily chosen based on (6), for the ellipse, 
and (39) and (40). The results hold for hyperbolas of abritary orientation. 

Figure 7 is an example of a single branch hyperbolic fit resulting simply 
from the use of data points taken at more uniform intervals along the curve 
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A second and generally successful approach to the problem is the addi- 
tion of a penalty function to (4) if the Ins' fitting curve consists of portions 
of two branches of a hyperbola or of an exceedingly small ellipse. For c^, c 2 , 
c 3 as in (4), the penalty function used was 

N(c 2 2 - 4 0jl c 3 ) (41) 

2 

for (c 2 - 4 ^ c 3 ) > 0 and where K is a positive large constant, large compared 

2 

with the discriminant function (c 2 - 4 c^ c^). With (4) augmented by (41), 

the equations for the optimal c are still linear. Note that for H large, if 
2 

c 2 - 4 c 3 > 0, (41) can impose a large penalty on (4). Hence, the use of 

2 

this penalty function tends to drive c 2 - 4 c^ c^ to 0, Unfortunately, this 
can happen by forcing all of c^,c 2 ,c.j to be small, and, in fact, this is what 
happens in practice. That is, an appropriately shaped and sized ellipse seems 
to be prevented by use of (41), and instead the best fitting curve is forced to 

i 

be a single branch of a hyperbola with the other branch pushed out toward 
infinity for large N. This is a satisfactory solution for many purposes. In 
practice, if (41) is required for a set of n data points, the inversion of 

at that stage cannot be carried out recursively in terms of B n _^. However, if 

-1 -1 -1 
N is then held fixed, B n+ ^, B n+2> etc. can be computed recursively from , 

A reasonable procedure then is to compute two sequences of c n, s, one with 

a penalty function and fixed N and one without penalty function. When the one 

without settles down to that for an elliptic curve fit, the sequence based on 

the penalty function can be discontinued. 

The curves in Fig. 8 illustrate the use of (41) in the sequential fit of 

quadratic curves to data. Data points are found in sequence starting at the 

upper y-axis and proceeding through the second quadrant down through the third 


quadrant. The points were generated as Gaussian perpendicular perturbations of an 
ellipse. The basic sequential algorithm (9) fits hyperbolas to the first sets of 
6 and 7 data points. Both branches are involved in a fit. By use of (41), 
hyperbolic fits involving only a single branch result. These single branch curves 
are shown in the figure. The same N was uced in both. The choice of N affects 
the separation of the two branches of the resulting hyperbola. Furthermore, since 
single branch hyperbolic fit to the data usually involves that portion of the 
branch of greatest curvature, N has an affect on the shape of the resulting fit. 

He see that the fit to six data points is flatter then it really should be thus 
indicating that a smaller N should have been used which would not have pushed the 
two branches as far apart and would have permitted the branch used to have greater 
curvature. Beginning with the ninth data point, the basic algorithm (9) fit on 
ellipse. Note that beginning with the fit to seven data points, the resulting 
curves are useful for locating the next data point. 

The second phonomenon requiring comment is the following. Suppose data in 
the vicinity of the origin is best fit by an elliptical arc. Then if the data is 
translated in the x and y directions by x' and y’ , respectively, the elliptical 
arc which best fits the new data is not the translation by (x^y*) of the elliptical 
arc fitting the original data. More specifically the dependence of (6) on p, which 
is a function of v Q and C, leads us to understand that the best fitting ellipse to 
data removed from the origin will have major and minor axes rotated and scaled dif- 
ferently than would result were the data in the vicinity of the origin. Figure 9 
contains overlays of the elliptical arc fits for data at the origin and for the same 
data translated by (25,25). The solution to the removal of this distortion is to 
produce a rough elliptical curve fit to the data removed from the vicinity of the 
origin, then subtract 
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the coordinates of the center of the rough-fit ellipse from the data points 
to be used in the fitting process, and then proceed as usual to curve fitting 
on the translated data. Analogous statements apply if single branch hyperbolic 
fits are involved. 

Conclusions 

For the transformed-data linear regression approach to quadratic curve 
fitting studied in this paper, we interpreted the measure of curve fit error 
minimized by the procedures. This error is not the sum of squares of the 
shortest distances between data points and the approximating curve. Nevertheless, 
the experimental results indicate that the resulting fit is visually satisfactory, 
our error is small when the standard perpendicular distance error is small, and 
the measure should be perfectly satisfactory for almost any use. 

For straight line fitting, we showed that the data search cost can be an 
appreciable portion of the total line-fitting cost. The data search cost here 
was minimal becuase of the tacit assuiyf^ion of data connectedness so that a data 
search along a straight line would eventually locate a data point. When such an 
assumption is no longer true, the data search would have to be over a region 
rather than along a line and would be more costly. 

Though the basic curve- fitting algorithm is fast and requires little 
computation, our experimental results revealed occasional undesired behavior such 
as the fitting of two branches of a hyperbola rather than an arc of an ellipse, 
and distortion in ellipse orientation and shape when the data is noisy and far 
removed from the origin. Note, neither problem arises if the data contains little 
noise and the underlying curve is an ellipse. However, when the problem does 
occur, we found that a satisfactory curve fit involving a single-branch of a 
hyperbola could be forced through use of a penalty function, and as the curvature 


in the data being processed becomes apparent, the original algorithm without 
penalty function begins to fit an ellipse. The distortion problem was also 
handled simply as discussed in the section on experimentation. 

Curves relating straight-line fitting costs and fitting error were prepared. 

- 1/2 

Asymptotically, error varies as (cost) as expected. It is assumed here that 
the length of the data interval being treated is fixed and the option under in- 
vestigation is the choice of number of data points. The curves shown are para- 
meterized by two parameters: noise standard deviation divided by cell size, ct/A, 

and by noise standard deviation divided by the x interval (or y interval) over 
which the approximation is made, a /^ x n “ x i)* The curve parameter computation costs 
are derived for straight lines and quadratic curves. It is seen that the latter 
costs are roughly six times the former. If noisy data search costs are included, 
the ratio probably will not change much since the line fitting cost may increase 
by up to 50%, but the quadratic curve fitting cost may almost double due to the 
peculiarities previously noted. Thus, quadratic curve fitting is relatively ex- 
pensive. However, for the purpose of recognition of highly variable pictures, 
curve fitting is often necessary and quadratic curve fitting may be much more 
useful than piecewise linear fitting followed by polygon recognition and manipu- 
lation. Finally, since recognition can often be accomplished using other tech- 
niques, e.g., by comparisons, not involving multiplications or divisions, with 
stored templates, in any recognition cost evaluation an effort must be made to 
include all significant costs and not just an indication of numbers of multipli- 
cations and divisions. Our fitting costs could be reduced by exploiting the fact 
that the intervals used would often be constant and by using truncated 

variable values and thus perhaps replacing multiplications and divisions by opera- 
tions requiring less computation time. 

The data modeled in this paper consists of thin lines. Thick line 
drawings can be handled in a number of ways using the methods of this paper or 
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natural variations. The simplest modification, which should be satisfactory 
in a variety of applications, would be to replace the data points in a perpen- 
dicular slice across a thick line by the center point and then apply the methods 
of this paper to the transformed data. More efficient methods or more sophisti- 
cated methods can be developed depending on how one wishes to model the data 
generation. 
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APPENDIX 

We show here that the test (31) behaves as if the two hypotheses are 
simple. The Gaussian distributions under these simple hypotheses have mean 

w irt w. j. 

vectors (c 2 x i* C 2 X 2 , **' , C 2 X n^ an( * ^ z l c ' ’ Z 2 C ' »**•» z ^ c ' )» respectively, 
where the latter is a least squares linear fit to the former and 

T n * 

c ,n = (c ,n , c« n ) T is given by -B~^ l z*c 0 x? . Hence, all the 
o.l n i=1 1 i 

components of the mean value vector for the latter hypothesis are functions of 
n. Justification for the conclusion that the test performs as though the 
hypotheses are simple is as follow. We first show that the linear portion of 
the mean value function of y^ in the test statistic (31) does net contribute to 
the value of (31). Denote by r n the. n vector having i-th component 

z.B l z.c 9 x.* , and by s the n-vector having i-th component c_x. . Because r i 
x n • 3 — D x 

J-X 

a linear; least squares fit to s n , the sum £ (r^ - s^j?) is 0 and the inner product 

i=l 

of r n with r 11 - s n is 0. Since the linear portion of the mean value of y can 

be written as a constant vector plus a scalar multiple of r 11 , it follows that 

the inner product of the linear part of the mean value of y n with r n - s 11 is 0. 

Hence, (31) would have exactly the 1 . same value if y 11 had zero mean value vector 

under H q and mean value vector (cji*^,. . . ,c 2 x n ) under H^. Indeed, (31) would 

have exactly the same value if the mean value of y 11 were r n under H q and 
T 

(c 2 Xi,...,c 2 Xn)* under H 1< But (31) is the SPRT for these hypotheses, thus 
concluding the proof. 
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TABLE 1 


Assembly Language Instructions and Average 
Realization Times for IBM 360/65 


Average Approximate 


Instruction 

Mnemonic 

Realization Time (v secs) 

Add memory to register 

A 

2 

Add reg. to reg. 

AR 

1 

Branch on index low or 



equal reg. to mem. 

BXLE 

2 

Compare logical 

CLR 

1 

Divide reg. to reg. 

DR 

9 

Load from mem. to reg. 

L 

1 

Load multiple 

. LM 

1 + R/2* 

Load reg. to reg. 

LR 

1 

Multiply reg. to mem. 

M 

5 

Multiply reg. to reg. 

MR 

4 

Subtract mem. to reg. 

S 

2 

Shift left single logical 

SLL 

1 

Subtract reg. to reg. 

SR 

1 

Shift right single logical 

SRL 

1 

Store from reg. to mem. 

ST 

1 

Store multiple 

STM 

1 + R/2* 


* R is the number of registers stored or loaded. 




i. 





TABLE 2 


Numbers of Major Operations and Total CPU Times Per Recursion for 
Computing c 11 * e n , y n for Straight Lines and Quadratic Arcs 


Quadratic Arc 


Straight Line 


ntl 


n+1 


86 AR+ 5 DRt 7 4LR+ 80Mt 8 3 SRL 
+ others = 800p secs. 

4A+5M+other?i u 41p sees. 



3AtDR+SM+5SRL + others 
= 52y secs. 


search 
i pts 


5AR+2DRtl2LRtlOMR+llSRL 
+ others s 120p secs. 

3AR+3MR+3SRLtothers « 20y secs. 


2A+M+L+SRL = 10p secs. 
8+(i-l)3y secs. 


Programs are written using fixed point arithmetic and it’s assumed 
that all 16 32-bit registers are available. 
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Figure Captions 

Figure 1 Illustration Of The Error Measure In Expression (*0. 

Figure 2 Illustration Of The Pcrmeters In Expression (6), 

Figure 3 Fitting Cost (cpu time) As A Function Of Error Standard Deviation 
For Straight Line Fitting To Noisy Data. 

Figure 4 Region Of Constant Prior Probability Density For Underlying Lines 
For Succeeding Data. 

Figure 5 TVo- Branch Hyperbolic Fit To Noisy Perturbations Of An Elliptic Arc. 

Figure 6 Illustration Of The Error Measures In Expressions (39) And (40), 

Figure 7 Single Branch Hyperbolic Fit To Noise Perturbations Taken At Roughly 
Equal Intervals Along An Elliptic Arc. 

Figure 8 Sequential Curve Fitting Using A Penalty Function To Force Single 
Branch Hyperbolic Fits In The Initial Stages. 

Figure 9 Overlay Of Elliptic Fit To Data Near The Origin And Elliptic Fit 
To The Same Data Translated From the Origin. 
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